Evaluation of an Arabic version of Children's Self-report Social Skills Scale
(CS4) based on Item Response Theory

Abstract 

The present study examined the psychometric properties of the Arabic
version of the Children's Self-report Social Skills Scale (CS4) using a generalized
partial credit model (GPCM). Responses from 722 primary school children (401 boys
and 321 girls) in Egypt were analyzed using the GPCM. The results
indicated that the 21 items are able to discriminate among the levels of children's
social skills. The item and test information functions indicated that the three
subscales were more informative for low and medium levels of social skills.
Items 6, 13, and 17 showed a poor fit to the GPCM, and these items could be
removed to improve the psychometric properties of the CS4. Overall, the current
findings suggest that evaluation of social skills among Egyptian elementary
school children using the 21-item CS4 may usefully focus on
items that performed well in these IRT analyses.

Introduction



Today, schools are held accountable for improving social competencies
as well as raising academic achievement to prepare students for successful 
adulthood and citizenship. Therefore, one of the purposes of elementary 
education is to develop social skills among children because social skills are just 
as important as academics (Rashid, 2010). 

Social skills are defined as communicating, understanding other people, 
acting according to social environments, making friends, displaying acceptable 
behaviors, expressing oneself, dealing with problems and establishing a good 
relationship with the environment (Samanci, 2010). Social skills are also defined as
understanding both one's own and other individuals' feelings, thoughts and 
behaviors related to various interactions, and behaving according to that 
understanding (Sahin, 2010). 

Akkok (as cited in Sahin, 2010) has classified social skills into six groups.
These are: (1) skills for initiating and continuing relationships; (2) skills for
teamwork; (3) feeling-oriented skills; (4) skills for coping with aggressive
behaviors; (5) skills for coping with stressful situations; and (6) skills for
problem solving and planning.

Social skills are a central part of learning, playing, and behaviors such as
sharing, helping, initiating communication, requesting help from another
person, and giving compliments.

Children who lack important social skills often are rejected by their 
peers, have trouble interacting with their teachers and families, and have 
emotional difficulties. Furthermore, social skill deficits are related to poor
academic performance and are frequently found in children exhibiting
externalizing disorders such as delinquency and conduct disorder, as well as
in those with internalizing disorders such as depression and anxiety (Warnes, Sheridan,
Geske, & Warnes, 2005). Impairments in social skills are related to a broad
range of problems including juvenile delinquency, ADHD, developmental
disabilities, social isolation and withdrawal, aggression and antisocial behavior,
mental health problems, and dropping out of school (Matson & Wilkins, 2009).

By contrast, children with high levels of social skills
are able to adjust to their environment, succeed in avoiding conflict, and
maintain good communication with others (Cummings, Kaminski, & Merrell,
2008). Children who develop adequate social skills tend to exhibit fewer
problems with adults and peers, and better adjustment in society (Shahim, 2004).

Several theories have highlighted the interdependency of emotional and social
competence (e.g., Denham, 2007; Rose-Krasnor, 1997). Social interactions and
relationships are guided, even defined, by the emotional transactions within them.

Emotional competence, in turn, is crucial to children's ability
to interact and form relationships with others. Knappmeyer, Thornton, and
Bulthoff (2003) found a link between a decreased ability to recognize emotion
and social dysfunction.

Socially competent children should be more effective in recognizing
emotions in others and in themselves, regulating their own emotional experience,
and sympathizing with the emotions of their peers. Conversely, children who are
emotionally troubled lack the skills to establish and sustain successful
relationships with their peers and teachers (Yukay-Yuksel, 2009).

Social Skill Assessment 

Identification and treatment of children with low social skills are
important tasks for educators, psychologists, and mental health
professionals. Screening and assessment are essential foundations for effective
intervention in the social-behavioral problems of children and youth. Without the
careful identification, classification, and selection that should be a part of a good
assessment, social behavior intervention is likely to be haphazard and
disorganized at best and ineffective at worst (Merrell, 2001).

Elliott and Gresham (1987) provided a heuristic framework for
conceptualizing the assessment of children's social skills that includes: (a)
teacher, parent, and student ratings, (b) teacher, parent, and student interviews, (c)
observation, (d) behavioral role playing, and (e) sociometric techniques. Merrell
(2001) mentioned that there are six primary methods of gathering assessment
information about children's social skills: behavioral observation, behavioral
rating scales, interviewing, self-report instruments, projective-expressive
techniques, and sociometric techniques. Matson and Wilkins (2009) classified
the methods of assessing social skills into two primary approaches: standardized
role-play scenes and tests of social behavior with a list of items that can be
scored in a Likert format.

Although all of the aforementioned methods have advantages and
disadvantages, self-report inventories are one of the most straightforward ways
of measuring social skills (Elliott, Busse, & Gresham, 1993). Advantages of
child self-report measures include the following: (a) the instruments are
generally inexpensive in terms of administration time and can be easily
administered in a wide variety of settings such as schools (Beitchman &
Corradini, 1988), (b) they assess feelings and tendencies over a wide range of
unobservable social behaviors and situations (Segrin, 2000), and (c) they provide
meaningful information (the child's perceptions and cognitions) that is not
otherwise accessible to other reporters.

In the USA a number of scales to measure children's social skills have been
developed. The Children's Self-Report Social Skills Scale (CS4; Danielson & Phelps,
2003) is one of these scales and is frequently used in the USA. However, this
scale is not well known in Arabic countries, and it has not been evaluated with
an Arabic population. Furthermore, few scales to measure
children's social skills have been developed in Arab countries in general, and in Egypt in
particular.

The Arabic version of Riggio's Social Skills Inventory (SSI), which was
translated by Samadoni (1991), is one of the most common scales that measure
social skills. The SSI is a 90-item instrument in which subjects rate themselves
on both positive and negative social behaviors. The scale applies to subjects aged
14 years and over who read at or above the eighth grade level. Three
disadvantages are associated with the SSI: the psychometric properties of the
instrument were estimated through Classical Test Theory (CTT), which is
sample dependent; the instrument is too long to use as a screening instrument; and
the original version of the instrument is over 20 years old.

Item Response Theory versus Classical Test Theory 

Although the CS4 has indeed been a promising instrument, it is noteworthy
that extant work has principally evaluated this instrument using CTT
methodologies (Danielson and Phelps, 2003; Gengdogan, 2008). CTT has a
number of limitations, including item-dependent estimates and an unconditional
standard error of measurement (see Magno, 2009; Rouse, Finger, and Butcher,
1999).

To address these types of concern, Item Response Theory (IRT) has proven
valuable. IRT offers some unique advantages over CTT, including: (a)
detailed description of the performance of individual items, (b) indices of item
and scale precision that are free to vary across the full range of possible scores,
(c) assessments of item and test level bias with respect to demographic
subgroups, (d) measures of response-profile quality, and (e) computer-adaptive
testing, which can dramatically reduce testing time (Hall, Hidalgo, Tomas-Sabado,
2007; Reeve, Hays, and Chang, 2007).

Indeed, IRT has been successfully employed to refine the assessment of a
number of psychological instruments (e.g., Becker, Schwartz, Saris-Baglama,
Kosinski, and Bjorner, 2007; Cooke, Kosson, and Michie, 2001; Gomez,
Hidalgo, and Tomas-Sabado, 2007; Hall, Reise, and Haviland, 2007; Takegami et
al., 2009; Zvolensky, Strong, Vujanovic, and Marshall, 2009).

With this background, the aims of the present study were to evaluate 
the psychometric properties of the CS4 by using both the CTT and the IRT models,
and to underline the differences and similarities between the two models. 

Method 



Participants 

Participants were 722 5th and 6th grade students attending public
elementary schools in Alexandria, Egypt. Of the 722 students, 401 were boys and
321 were girls, with 403 children in the 5th grade (226 boys and 177 girls) and 319 in
the 6th grade (175 boys and 144 girls). The mean age was 11.52 years (SD = 0.652)
for boys and 11.40 years (SD = 0.700) for girls.

Measures 

Children's Self-Report Social Skills Scale (CS4): The CS4 (Danielson and Phelps, 2003) is
a 21-item measure in which children are asked to rate their own social behavior on a
5-point Likert-type scale (1=never, 2=hardly ever, 3=sometimes, 4=most of the
time, and 5=always). For the seven items that are framed to measure poor skills
(e.g., speaking too loudly), points awarded to a response are reverse scored.
Points are added to obtain a total score, with high scores representing greater
social skills. The range of possible scores is 21 to 105.
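
The scoring rule described above can be illustrated with a short sketch. The text does not list which seven items are reverse scored; the set used here (items 4, 5, 7, 8, 16, 17, and 21) is an assumption based on the negatively worded items in Table 2, and the function name is hypothetical.

```python
# Assumed reverse-scored items (negatively worded in Table 2); not stated in the text.
REVERSE_ITEMS = {4, 5, 7, 8, 16, 17, 21}


def score_cs4(responses, reverse_items=REVERSE_ITEMS):
    """Return the total score from a dict mapping item number (1-21) to a rating (1-5)."""
    if sorted(responses) != list(range(1, 22)):
        raise ValueError("expected ratings for items 1 to 21")
    total = 0
    for item, rating in responses.items():
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {item}: rating must be between 1 and 5")
        # On a 5-point scale, reverse scoring maps 1<->5 and 2<->4; 3 is unchanged.
        total += 6 - rating if item in reverse_items else rating
    return total


# A child answering "always" to every item, with no reverse scoring applied,
# reaches the maximum of the stated 21-105 range.
highest = score_cs4({i: 5 for i in range(1, 22)}, reverse_items=set())
```

Whatever the actual reverse-scored set is, the total always stays within the 21 to 105 range stated above, because each item contributes between 1 and 5 points.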

Emotion Awareness Questionnaire for Children Revised (EAQC-R): The EAQC-R
(Rieffe, Oosterveld, Miers, Terwogt, & Ly, 2008) is a modified version of the
EAQC (Rieffe et al., 2007). The EAQC-R aims to identify how children and adolescents
feel and think about their feelings. It was designed with a six-factor structure describing
six aspects of emotional functioning: (1) Differentiating Emotions, (2) Verbal Sharing, (3)
Not-Hiding Emotions, (4) Bodily Awareness of Emotions, (5) Attending to Others'
Emotions, and (6) Analysis of Emotions. Respondents were asked to rate the degree to
which each item was true about them on a three-point scale (1=not true, 2=sometimes true,
3=often true). Reliability coefficients of the six subscales of the English version varied
between 0.63 and 0.68. Those of the six subscales of the Arabic version ranged
from 0.70 to 0.75.

Translation Procedure 

The original English version of the CS4 was first translated into the
Arabic language by the author of the present study. The initial translation was
checked by a psychologist and an English language expert. Based on their
comments, wording adjustments were made. The Arabic version was then back
translated into English by a fluent bilingual researcher who had not seen the
original English version. The two versions were compared, and minor wording
adjustments were made based on this step.

Item Response Theory Model

When item responses are coded into more than two ordered categories,
polytomous ordered response models are appropriate for examining the
psychometric properties of the test. The graded response model (GRM) of Samejima (1969) and
Muraki's (1992) generalized partial credit model (GPCM) are item response
models that can be applied to Likert-type scaled items. However, the GPCM
offers great flexibility and potential benefits when working with polytomous
items (Fox, 1999; Masters, 1988).

On the other hand, Edelen and Reeve (2007) believe that the choice between 
these two models is somewhat arbitrary, as they generally produce nearly 
identical results, albeit with slightly different parameterizations, and choosing 
one of these two models over the other tends to be primarily a result of personal 
preference and familiarity with software (PARSCALE is set up to estimate the 
GPCM more easily, whereas MULTILOG favors the GRM). 

In the current study, the GPCM was used to calibrate the CS4. The GPCM
estimates two main types of parameters: the slope or discrimination parameter, and the
threshold or step difficulty parameters, which are associated with the transition
from one category to the next. The number of step difficulty parameters equals
the number of categories minus one (Embretson & Reise, 2000).
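
To make the roles of the slope and step parameters concrete, the following is a minimal sketch of how category probabilities can be computed under a GPCM for a single item. It rests on assumptions that are not taken from the text: each step threshold is formed as the item location plus a step value, and the optional scaling constant D is omitted. The function name is hypothetical, and this is not a reproduction of PARSCALE's internal routine.

```python
import math


def gpcm_probs(theta, slope, location, steps):
    """Category probabilities for one item at trait level theta.

    steps holds one value per transition between adjacent categories, so an
    item with five response categories has four steps.
    """
    # Assumed parameterization: threshold for each step = location + step value.
    thresholds = [location + d for d in steps]
    # Cumulative sums of slope * (theta - threshold); the lowest category is 0.
    z = [0.0]
    for t in thresholds:
        z.append(z[-1] + slope * (theta - t))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(z)
    weights = [math.exp(v - m) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative five-category item: the probabilities always sum to one, and
# higher trait levels shift probability toward the higher categories.
low = gpcm_probs(-6.0, 1.26, -1.39, [-2.06, -0.42, 0.68, 1.80])
high = gpcm_probs(3.0, 1.26, -1.39, [-2.06, -0.42, 0.68, 1.80])
```

With five response categories the function returns five probabilities, one more than the number of step parameters, which mirrors the "categories minus one" rule stated above.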

Results 



Results of Classical Test Theory 

As shown in Table 1, the overall mean of the girls is slightly higher than the
corresponding mean of the boys. The means of boys and girls on
Social Rules and Likeability are very close, whereas the mean of Social
Ingenuousness is smaller for girls than for boys, indicating that boys show more
Social Ingenuousness than girls. The variability of the three scales is very similar
across boys and girls. The reliability coefficients of the three subscales and the
total score were above the criterion of 0.70, indicating that the
scale and the three subscales are reliable for boys and girls. The items of the CS4
were also evaluated by calculating correlation coefficients between the
score on each item and the total score. For all items except item 15, the
correlation coefficients were significant and greater than 0.30, which met
Danielson and Phelps' (2003) criterion for good item-scale correlation. Only
item 15 (r=0.16) failed to meet this criterion, although this correlation was still
statistically significant.
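
The two classical indices used here, the alpha coefficient and the item-total correlation, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run in the study: the data layout (one list of item scores per respondent), the function names, and the toy ratings are hypothetical, and the item-total correlation is the uncorrected one (each item is correlated with a total that includes it), as the wording above suggests.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Alpha coefficient; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(rows):
    """Correlation of each item with the (uncorrected) total score."""
    totals = [sum(r) for r in rows]
    return [pearson([r[i] for r in rows], totals) for i in range(len(rows[0]))]


# Hypothetical ratings for five children on three items (not study data).
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy)
flagged = [i for i, r in enumerate(item_total_correlations(toy), start=1) if r <= 0.30]
```

Items whose correlation falls at or below 0.30 would be flagged in the same way that item 15 is singled out above.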

Evaluation of the Model Assumption 

The assumptions of unidimensionality and local independence were evaluated. The two
concepts are related: a data set is unidimensional when the item responses are
locally independent given a single latent trait. Because the items use a five-point ordinal
scale, polychoric correlations, which estimate the linear relationship between the continuous
latent variables assumed to underlie two observed ordinal variables, were computed using PRELIS 2.5.4.

Because the CS4 was used for the first time in the Arabic culture,
exploratory factor analysis was used, and the number of factors was not fixed.
Three criteria were used to determine the number of factors: (1) the eigenvalue
should be larger than one; (2) the scree test; and (3) the content of the factor.
An eigenvalue greater than one is the most common criterion but often results
in too many factors. Therefore, as a second criterion, the scree test was used: a
graph of the eigenvalues for each factor in which one looks for breaks in the
graph. Items with an absolute factor loading smaller than 0.40 in the pattern
matrix of the principal axis analysis with Promax rotation were excluded.

Table 1: Means, standard deviations, and alpha coefficients of the CS4 subscales for boys and girls

CS4 subscales           Means             Standard Deviation   Alpha Coefficient
                        Boys     Girls    Boys     Girls       Boys     Girls
Social Rules            39.58    39.99    6.69     6.59        0.77     0.80
Likeability             15.99    15.88    3.02     2.93        0.75     0.76
Social Ingenuousness    14.75    12.83    3.75     3.57        0.74     0.72
Total Score             80.32    82.54    10.40    10.50       0.80     0.84

SPSS 18 was used to explore the factorial structure of the CS4. As shown in
Table 2, for both boys and girls, the eigenvalue criterion extracted three factors
with eigenvalues greater than one (boys: 3.46, 1.96, 1.43; girls:
3.77, 2.00, 1.41). Similarly, the scree plot indicated three factors.

The three factors explained 31.97% of the variance for boys and 34.17%
for girls. The correlations among the three factors indicated that the subscales
were related to each other, but the size of the correlation coefficients, which ranged
from 0.13 to 0.54, showed that the subscales measured separate constructs. The
fact that all but one item loaded with a factor loading greater than 0.40 on
one and only one CS4 factor provided evidence of the unidimensionality of
each subscale, which supported our use of IRT to psychometrically describe
the scale.

For each subscale, the unidimensional generalized partial credit model
(GPCM), which offers a better understanding of how each item and subscale
measures children's social skills, was fitted using marginal maximum
likelihood estimation procedures implemented in PARSCALE 4.1 (Muraki &
Bock, 2003).

Because the model-data fit can be tested directly for each item under
the GPCM, a likelihood-ratio chi-square statistic (G2), which indicates the goodness of fit
between the expected and observed response frequencies, was computed. A
non-significant difference between the expected and the observed frequencies
indicates a good fit.
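
The general form of a likelihood-ratio statistic comparing observed and expected frequencies can be sketched as follows. This is the generic G2 formula, 2 times the sum of observed * ln(observed / expected), and it does not reproduce how PARSCALE groups respondents into trait intervals. The function name and the frequencies are hypothetical.

```python
import math


def likelihood_ratio_g2(observed, expected):
    """Generic likelihood-ratio statistic: 2 * sum(O * ln(O / E))."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    g2 = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # cells with a zero observed count contribute nothing
            g2 += o * math.log(o / e)
    return 2.0 * g2


# Hypothetical frequencies: identical observed and expected counts give 0,
# and the statistic grows as the two sets of counts diverge.
perfect = likelihood_ratio_g2([15, 15], [15, 15])
worse = likelihood_ratio_g2([10, 20], [15, 15])
```

A value near zero corresponds to the "good fit" case described above; larger values are compared against a chi-square distribution with the appropriate degrees of freedom.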

Construct Validity 

In the current study, the construct validity of the CS4 was investigated by
computing the correlation coefficients between emotional awareness, as
measured by the Emotion Awareness Questionnaire for Children Revised (Rieffe,
Oosterveld, Miers, Terwogt, & Ly, 2008), and the components of the CS4. We
hypothesized that the positive components of the CS4 (Social Rules and
Likeability) would correlate positively with the total score on the EAQC-R, whereas
Social Ingenuousness, as a negative component of the CS4, would correlate
negatively with the total score on the EAQC-R.

The findings from the present study indicated that the correlation
between the CS4 and the Emotion Awareness Questionnaire for Children Revised (EAQC-R)
was statistically significant for both boys and girls. Pearson's correlation
coefficient demonstrated a significant positive correlation between children's total
scores on the CS4 and their scores on the EAQC-R for both boys (r=0.38, p<0.01)
and girls (r=0.46, p<0.01). This result suggested that higher scores on the
self-report social skills scale were associated with higher scores on the EAQC-R.

Table 2: Pattern Coefficients, Eigenvalues, and Variance Accounted
for by the Items on the Three Components of the Children's Self-Report
Social Skills Scale for Boys and Girls

Items                                                   Boys    Girls

Social Rules
1. I look others in the face when they talk.            0.34    0.30
3. I say Thank you when someone does something          0.57    0.41
   nice for me.
6. I take turns with others.                            0.46    0.48
9. I listen to others when they talk.                   0.43    0.47
10. I share games and toys with others.                 0.44    0.56
11. I say I am sorry when I hurt someone by accident.   0.58    0.68
12. When I see others playing a game I would like       0.48    0.56
    to play, I ask if I can join them.
14. I say I am sorry when I hurt someone on purpose.    0.42    0.52
19. I help others when they need help.                  0.42    0.65
20. I ask others to play.                               0.44    0.45

Likeability
2. Others like me and have fun with me.                 0.57    0.63
7. When I come over, others ask me to move or give      0.51    0.47
   them more space.
13. I make friends easily.                              0.43    0.48
16. Others do not like me.                              0.62    0.59
18. Others ask me to play.                              0.43    0.51

Social Ingenuousness
4. I kick or hit someone else if they make me angry.    0.75    0.71
5. I am bossy.                                          0.74    0.69
8. I do not play fairly.                                0.44    0.44
15. I walk up to others and start a conversation.       0.41    0.48
17. I speak or interrupt if someone else is talking.    0.74    0.45
21. I am too loud when I talk.                          0.41    0.54

                          Social Rules      Likeability       Social Ingenuousness
                          Boys     Girls    Boys     Girls    Boys     Girls
Eigenvalue                3.46     3.77     1.96     2.00     1.43     1.41
Variance Accounted for    16.48    17.93    9.35     9.54     6.14     6.70

The correlation coefficients between the three components of the CS4 (Social
Rules, Likeability, and Social Ingenuousness) and the EAQC-R were examined.

The findings indicated that each component of the CS4 correlated
significantly with the EAQC-R. More specifically, Social Rules (boys: r=0.28,
p<0.01; girls: r=0.40, p<0.01) and Likeability (boys: r=0.24, p<0.01; girls:
r=0.20, p<0.01) correlated positively with the EAQC-R,
whereas Social Ingenuousness correlated negatively with the EAQC-R (boys:
r=-0.24, p<0.01; girls: r=-0.26, p<0.01).

IRT Parameter Estimates of the Three Subscales for Boys and Girls

Because the CS4 was found to have a three-factor structure, parameters were
estimated separately for each subscale. GPCM parameter estimates for the three
subscales are presented in Tables 3 and 4. Under the GPCM, each item has a single
slope parameter (a), a single location parameter (b), and four step difficulty
parameters (d).

The slope parameter (comparable to discrimination) describes how well the item 
performs in general. A large slope value indicates that the item is good at discriminating
among the different levels of the latent trait. The difficulty of an item is indicated by the 
location parameter. Thus, a large positive value indicates a difficult item or that few 
examinees respond in the higher categories. A negative value indicates an easy item or that 
few examinees respond in the lower categories. The location parameter functions to shift the 
category parameters up and down the latent trait scale. 

The step difficulty parameter indicates the difficulty of the step in moving
from one response option to another. For example, if an item has four response
options, then there will be three steps, namely step 1 for moving from the first
response option to the second option, step 2 for moving from the second option
to the third option, and step 3 for moving from the third option to the fourth
option. Higher positive values indicate difficult steps, while low and negative
values indicate easy steps. Tables 3 and 4 show the estimated item parameters
for boys and girls respectively, for the three subscales of the CS4 (Social Rules,
Likeability, and Social Ingenuousness). The items are evaluated with respect to
their slope parameter, their threshold parameter, and the step parameters; the
latter are interpreted as the difficulty associated with a given category compared
with that of other categories, or the deviation of each category threshold from the
item location.

As shown in Tables 3 and 4, for the CS4 subscales, the values of the category
parameters of step 1 were negative and low relative to steps 2, 3, and 4. With a few
exceptions, the category parameter values for steps 3 and 4 were positive, whereas those for
steps 1 and 2 were negative. In general, the category parameter values were monotonically
increasing from step 1 to step 4. These trends were consistent across the three subscales and
across boys and girls.

Based on the guidelines suggested by Baker (2001), the items of the three
subscales for boys and girls show moderate (0.65 to 1.34) and large (1.35 to
1.69) discrimination values. For Social Rules items, item 3 was the most
discriminating item for boys, whereas, for girls, item 11 had a relatively larger value
than the other items. On the other hand, items 10 and 1 had relatively lower
discrimination values than other items for boys and girls respectively.

For Likeability items, item 13 had a relatively larger value for boys, with the other
items having about the same values. For girls, item 18 was the most
discriminating item, followed by item 2. On the other hand, items 2 and 13
were the least discriminating items for boys and girls respectively.

For Social Ingenuousness items, items 5 and 4 were the most discriminating
items for boys and girls respectively, whereas item 17 was the least
discriminating item for both boys and girls.
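
The labelling of slope values against Baker's (2001) guidelines can be sketched as a simple lookup. Only the moderate (0.65 to 1.34) and large (1.35 to 1.69) bands are quoted in the text; the remaining cut-offs below are filled in from Baker's guideline as it is commonly cited and should be treated as an assumption here. The function name is hypothetical.

```python
def discrimination_label(slope):
    """Verbal label for a slope value (logistic metric)."""
    if slope <= 0:
        return "none"
    if slope < 0.35:   # assumed band, not quoted in the text
        return "very low"
    if slope < 0.65:   # assumed band, not quoted in the text
        return "low"
    if slope < 1.35:   # quoted in the text: 0.65 to 1.34
        return "moderate"
    if slope < 1.70:   # quoted in the text: 1.35 to 1.69
        return "large"
    return "very large"  # assumed band, not quoted in the text


# Slopes reported for boys in Table 3: item 3 (Social Rules) and item 5
# (Social Ingenuousness).
labels = {3: discrimination_label(1.26), 5: discrimination_label(1.42)}
```

Applied to every slope in Tables 3 and 4, a lookup of this kind gives a quick way to check each item against the quoted bands.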

In relation to the location parameters, the ten items of the Social Rules
subscale had negative values, with item 19 being the easiest item for boys and girls.

Table 3: Generalized Partial Credit Model Item Parameter Values for Boys

                  Category
Item   Step 1   Step 2   Step 3   Step 4    Slope Location       G2       df        p

Social Rules
1       -3.08    -1.67     1.41     3.34     0.75    -0.66    37.41       33     0.27
3       -2.06    -0.42     0.68     1.80     1.26    -1.39    17.22       11     0.10
6       -2.53    -1.34     1.19     3.69     0.79    -1.33    37.33       23     0.03
9       -2.49    -1.41     0.57     3.33     0.82    -1.11    30.62       21     0.08
10      -6.51    -4.79     2.70     5.59     0.68    -3.88    28.55       22     0.16
11      -3.40    -2.04     2.64     2.80     0.88    -1.22    22.19       16     0.14
12      -1.60     0.32     0.64     1.64     0.88    -1.46    29.55       23     0.16
14      -3.22    -1.13     1.40     4.95     0.76    -0.58    37.94       26     0.06
19      -5.36    -2.02     2.06     3.32     0.70    -4.55    16.94       11     0.11
20      -3.17     0.92     0.95     1.30     0.78    -1.88    33.97       28     0.20

Likeability
2       -6.41    -4.32     4.54     6.55     0.61     0.00    28.77       26     0.32
7       -3.41    -2.71     1.69     4.88     0.69    -2.72    28.92       23     0.21
13      -1.11    -0.64     0.16     1.54     0.88    -1.06    48.08       24     0.00
16      -4.21    -2.71     3.03     4.55     0.72     0.11    42.21       35     0.19
18      -2.40    -0.47     0.53     2.34     0.82    -1.47    30.51       25     0.21

Social Ingenuousness
4       -1.22    -0.36    -0.09     1.67     0.97    -0.12    40.27       32     0.15
5       -0.63    -0.58     0.25     0.96     1.42    -0.38    27.99       19     0.08
8       -8.06    -2.09     3.09     5.79     0.72    -2.41    17.84       16     0.33
15      -2.47    -2.30     2.10     2.54     0.79    -0.97    37.02       28     0.12
17      -6.05    -4.51     4.52     8.04     0.61     2.79    59.02       36     0.01
21      -6.22    -3.82     3.59     6.44     0.72     0.00    34.94       28     0.17

For the Likeability items, item 16 was the most difficult item for girls and
boys, whereas items 7 and 2 were the easiest items for boys and girls
respectively. Finally, for the Social Ingenuousness items, item 17 was the most
difficult item for boys and girls, whereas item 8 was the easiest item for boys
and girls.

Table 4: Generalized Partial Credit Model Item Parameter Values for Girls

                  Category
Item   Step 1   Step 2   Step 3   Step 4    Slope Location       G2       df        p

Social Rules
1       -5.64    -1.30     2.23     4.53     0.69    -0.98    29.87       26     0.27
3       -3.09    -2.53     1.50     4.11     0.89    -1.98     7.01        3     0.07
6       -4.35    -2.95     3.38     6.91     0.74     0.00    54.91       26     0.00
9       -2.92    -0.58     0.99     2.51     0.88    -0.84    19.24       17     0.13
10      -3.76    -0.59     1.97     2.38     0.85    -1.41    21.28       17     0.21
11      -0.82    -0.21     0.03     1.01     1.55    -1.39    18.57       11     0.07
12      -8.69    -7.01     7.21     8.84     0.70    -0.11    37.22       26     0.07
14      -3.67    -0.97     1.56     3.38     0.76    -0.84    29.21       23     0.17
19      -1.47    -0.10     0.68     0.90     1.03    -1.62    16.09       12     0.19
20      -2.99    -0.38     0.82     2.55     0.81    -1.41    26.59       22     0.23

Likeability
2       -1.06    -0.31     0.22     1.16     1.01    -1.20    21.33       17     0.21
7       -6.36    -3.76     2.50     7.61     0.70     0.60    17.52       15     0.29
13      -6.21    -3.87     3.39     7.68     0.60     0.00    52.71       27     0.00
16      -5.73    -4.64     3.64     6.21     0.62     2.83    28.21       24     0.25
18      -1.01    -0.50    -0.10     1.61     1.30    -1.04    21.78       19     0.30

Social Ingenuousness
4       -0.56    -0.34     0.36     0.54     1.26    -0.69    18.45       14     0.19
5       -1.86    -0.88     0.52     2.22     0.90    -1.31    16.13       14     0.31
8       -2.24    -1.23     1.24     3.32     0.80    -2.11    18.72       13     0.13
15      -2.26    -0.75     0.87     1.98     0.79    -0.81    24.35       20     0.23
17      -6.16    -4.04     3.52     7.52     0.72     2.79    50.59       27     0.00
21      -0.67    -0.34    -0.18     1.19     0.80    -2.08    26.85       18     0.08

Item-Fit Statistics 

There is no widely accepted goodness-of-fit statistic for polytomous IRT
models, and several strategies can be used (Embretson & Reise, 2000). In the
current study, the residuals by item and response category were calculated to
assess the fit. Moreover, the log-likelihood item-fit chi-square statistic (G2)
was calculated using the PARSCALE computer program. However, this test
is highly sensitive to sample size (Gomez et al., 2007) and probably should not
be treated as a solid decision-making tool. The G2 statistic is also sensitive to
the number of intervals into which the ability continuum is divided: with a high
number of intervals, the values of this statistic may be artificially high (Muraki,
1992).

Tables 3 and 4 display the G2 statistic for each item, its associated degrees
of freedom (df), and its probability value (p). The results were consistent
across boys and girls: the same three items, one from each subscale, had
significant chi-square values (p < 0.05), indicating their
poor fit to the model for boys and girls. Item 6 showed a poor fit for Social
Rules, item 13 for Likeability, and item 17 for Social Ingenuousness.

For boys, the overall goodness-of-fit statistics of the three subscales
(Social Rules, Likeability, Social Ingenuousness) were G2 = (305.72, 535.12,
237.08), with df = (214, 386, 159), and p < 0.001 respectively. Similarly, for girls
they were G2 = (305.72, 535.12, 237.08), with df = (214, 386, 159), and p <
0.001, indicating a poor overall fit to the data. The chi-square ratio
test (G2/df) was also examined. A significant chi-square value relative to the degrees of freedom
indicates that the observed and estimated matrices differ, and statistical significance
indicates that this difference is unlikely to be due to sampling variation alone. A chi-square/degrees
of freedom ratio of 2 to 3 or less is interpreted as suggesting a plausible model
(Carmines & McIver, 1981). The chi-square/degrees of freedom ratios for the
three subscales for boys (1.43, 1.39, 1.49) and girls (1.51, 1.29, 1.43) were below 2,
indicating a good fit.
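
As a worked check of the ratio test, the G2 and df values reported above for boys can be divided directly. This is a minimal sketch; the function and variable names are hypothetical.

```python
def chi_square_ratio(g2, df):
    """Chi-square (G2) to degrees-of-freedom ratio."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return g2 / df


# G2 and df for boys, as reported in the text.
boys = {
    "Social Rules": (305.72, 214),
    "Likeability": (535.12, 386),
    "Social Ingenuousness": (237.08, 159),
}
ratios = {name: round(chi_square_ratio(g2, df), 2) for name, (g2, df) in boys.items()}
# ratios == {"Social Rules": 1.43, "Likeability": 1.39, "Social Ingenuousness": 1.49}
```

The three ratios reproduce the values quoted for boys, all of which fall below 2.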

Item Information and Test Information Functions 

The Item Information Function (IIF) shows trait levels where the item has 
more precision and reliability; that is, it indicates in which trait levels the item is 
most informative. The notion of reliability from classical test theory (CTT) is
analogous to the notion of information in item response theory (IRT). In CTT, however,
reliability is summarized in a single coefficient that represents the average 
precision across all examinees. In IRT, information function plots show the 
varying precision of the ability estimate across the trait continuum. 

The IIF depends mainly on the item discrimination: as the item slope 
increases, the information provided by the item increases. Therefore, the three most 
discriminating items were the most informative items. For boys, items 3, 13, and 
5 were the most informative items, whereas items 10, 2, and 17 were the least 
informative items for Social Rules, Likeability, and Social Ingenuousness, 
respectively. For girls, items 11, 18, and 4 were the most informative items, 
whereas items 6, 13, and 17 were the least informative items for Social Rules, 
Likeability, and Social Ingenuousness, respectively. The most informative items 
are the most discriminating (having the highest slope parameter values), whereas the 
least informative items are the least discriminating (having the lowest slope 
parameter values). 
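
The link between slope and information can be illustrated numerically. Under the GPCM, the information of an item at a given trait level equals the squared slope multiplied by the variance of the item score at that level. The sketch below is a minimal Python illustration under stated assumptions: the step parameters and slopes are hypothetical, the scaling constant D used by some programs is omitted (taken as 1), and the function names are ours rather than those of any IRT package.

```python
import math


def gpcm_probabilities(theta, slope, steps):
    """Category probabilities for one GPCM item with categories 0..len(steps)."""
    # Cumulative sums of slope * (theta - step); category 0 has logit 0.
    logits = [0.0]
    for step in steps:
        logits.append(logits[-1] + slope * (theta - step))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def gpcm_item_information(theta, slope, steps):
    """Item information: squared slope times the variance of the item score."""
    probs = gpcm_probabilities(theta, slope, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return slope ** 2 * variance


# Hypothetical five-category item evaluated with a low and a high slope.
steps = [-2.0, -1.0, 0.0, 1.0]
thetas = [i / 10 for i in range(-40, 41)]
for slope in (0.5, 1.5):
    peak_info = max(gpcm_item_information(t, slope, steps) for t in thetas)
    print(slope, round(peak_info, 3))
```

With these hypothetical values, the item with the higher slope reaches a higher peak of information, which is the pattern described above.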

The Test Information Function (TIF) is the sum of the IIFs of the subscale items. 
As the number of items in a subscale increases, the TIF of the subscale 
increases. In the present study, the TIF was calculated for each subscale. These 
graphical representations show the trait levels where the tests are most discriminating. They 
also provide an overall standard error of measurement (SEM), given by the inverse square 
root of the TIF at each trait level. In other words, the TIF indicates, for each level of the 
latent trait, the amount of precision expected when estimating a subject's trait. For example, if 
a scale designed to measure social skills has a low SEM at the upper end of the trait 
continuum and a high SEM at the lower end, then the clinician using the scale is aware that 
high scores on the scale are fairly accurate estimates of the subject's social skills. 
Therefore, it is appropriate to use this instrument to detect subjects with high trait 
levels. In contrast, when this questionnaire is used with subjects with low trait levels, it will 
provide inaccurate estimates of these levels. That is, the instrument will not be able to detect 
differences between subjects with medium and below-medium trait levels. Thus, the TIF 
serves as a guide in deciding when and for what purpose a given instrument may best be 
applied. 
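
The relation between test information and the SEM can be sketched in a few lines of Python. The item information values below are hypothetical and serve only to show how a subscale can be precise at one trait level and imprecise at another. The function names are illustrative assumptions.

```python
import math


def subscale_information(item_information_values):
    """Sum the item information values observed at one trait level."""
    return sum(item_information_values)


def standard_error(test_info):
    """SEM is the inverse square root of the test information."""
    return 1.0 / math.sqrt(test_info)


# Hypothetical item information values for a four-item subscale,
# evaluated at a low and a high trait level.
info_by_level = {
    "low (theta = -1)": [1.2, 0.9, 1.1, 0.8],
    "high (theta = 2)": [0.10, 0.05, 0.06, 0.04],
}

for level, values in info_by_level.items():
    tif = subscale_information(values)
    print(level, round(tif, 2), round(standard_error(tif), 2))
```

In this hypothetical case the subscale carries far more information at the low trait level, so its SEM there is much smaller than at the high trait level.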

For both boys and girls, the TIF of each subscale shows that the Social Rules subscale 
(which consists of the largest number of items) provides the most information along the 
trait continuum. However, the Social Rules subscale is most informative at medium or 
low levels of Social Rules. Social Ingenuousness was the second most informative subscale 
for boys and girls. Similarly, Social Ingenuousness provides the most 
information at low levels of Social Ingenuousness. Finally, Likeability was the least 
informative subscale, although its information was higher for girls than for boys. The overall 
conclusion is that the three subscales are more accurate in estimating low trait levels of 
social skills, whereas they are inaccurate in estimating high levels of social skills for 
boys and girls. 

Discussion 

The aim of the present study was to use the GPCM to evaluate the psychometric properties 
of the Arabic version of the CS 4 for boys and girls. To compare the results of the present 
study with those of previous studies (Danielson & Phelps, 2003; Gencdogan, 
2008), the psychometric properties of the CS 4 were also evaluated based on classical 
test theory. 

The psychometric properties of the CS 4 based on CTT in the present study were 
closely comparable to the results from the English version (Danielson & Phelps, 
2003) and the Turkish version (Gencdogan, 2008). In the present study, the internal 
consistencies of the CS 4 total score and the three subscales were acceptable, and 
they were close to the ranges obtained from the original English 
version (Danielson & Phelps, 2003). The exploratory factor analysis revealed 
that the CS 4 consists of three factors. This finding is consistent with Danielson 
and Phelps' three-factor structure model. These findings provide a preliminary 
indication of the stability of the CS 4 across languages and countries. Such 
stability suggests that the CS 4 accurately captures the structure of children's 
social skills regardless of cultural differences. 

On the other hand, the main difference between the results of the present 
study and the previous ones was that two items from the Social Rules scale 
loaded on the Social Ingenuousness scale. Items 4 ("I kick or hit someone else if 
they make me angry") and 5 ("I am bossy") loaded differently from the English 
and Turkish versions. Inspection of the content of the two items indicated 
that they are more closely related to the Social Ingenuousness subscale than to the 
Social Rules subscale. 

The CS 4 showed an adequate fit to the generalized partial credit model. For boys 
and girls and across the three subscales, only three of the 21 items presented a poor fit to 
the model. Embretson and Reise (2000) noted that, in general, some sources of poor item 
fit may be: (a) multidimensionality, (b) a failure to estimate enough item parameters (e.g., 
when a polytomous Rasch model is fitted to data with varying slope parameters), (c) 
nonmonotonicity of item-trait relations, or (d) poor item construction. 

In this study, the poor fit of these three items is more likely related to issues of 
nonmonotonicity and wording. The Item Characteristic Curves (ICCs) of two of the three 
items (13 and 17) are flat. Thus, as the trait increases, the probability of endorsing each 
option remains constant. The two items (item 13, "I make friends easily", and item 17, "I speak 
or interrupt, if someone else is talking") imply high social desirability for Arab children. On 
the other hand, item 6 ("I take a turn with others") is worded in a way that is perhaps abstract 
and ambiguous, and may be difficult for some children to interpret. Thus, poorly fitting items 
should be revised in content, and may be excluded from the test provided that their exclusion 
does not narrow the definition of the trait being measured. 

In relation to slope, both boys and girls had acceptable values for all the Social 
Rules, Likeability, and Social Ingenuousness items. Thus, all the CS 4 items were generally 
good at discriminating their respective traits for boys and girls. Despite this, there was 
notable variability in discrimination across the items within scales, and across boys 
and girls. 

The present study found that, for all items for boys and girls, the category parameter 
values for steps 1 and 2 were smaller than the values for steps 3 and 4. These findings suggest 
that, for all items, moving from response option 0 to 1 and from 1 to 2 is more likely 
than moving from response option 2 to 3 or from 3 to 4. This study also found that the 
location parameter values for eighteen of the twenty-one items were negative, 
thereby indicating that these items are easy. This implies that endorsement of higher ratings 
on these items requires only a small amount of the relevant traits (Social Rules, Likeability, 
and Social Ingenuousness) to be present. 
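
In the GPCM, the probability of responding in category k rather than category k-1 is a logistic function of the slope multiplied by the difference between the trait level and the step parameter, so a smaller step parameter corresponds to an easier transition at any given trait level. The following minimal Python sketch illustrates this with hypothetical step parameters chosen only to mirror the pattern described above (steps 1 and 2 lower than steps 3 and 4). They are not the estimates obtained in this study.

```python
import math


def step_probabilities(theta, slope, steps):
    """P(category k rather than k-1) for each step of a GPCM item."""
    return [1.0 / (1.0 + math.exp(-slope * (theta - step))) for step in steps]


# Hypothetical step parameters: the first two are lower than the last two.
hypothetical_steps = [-2.0, -1.0, 1.0, 2.0]
probs = step_probabilities(theta=0.0, slope=1.0, steps=hypothetical_steps)
print([round(p, 2) for p in probs])  # [0.88, 0.73, 0.27, 0.12]
```

For a child at the middle of the trait continuum, the first two transitions are more likely than not, whereas the last two are unlikely.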

The findings from this study indicated that the three subscales of the CS 4 
had peaked information curves. Consequently, their precision differs markedly within their 
respective trait ranges. In general, the three subscales of the CS 4 tend to perform best for 
children at low and moderate levels of social skills. This may be exactly what one desires from 
a social skills measure; however, this distinction could be made with fewer items on each subscale. 
Computing Item Information Curves, as done in this study, would allow researchers to 
identify exactly how many items per scale are needed to achieve a given level of precision 
within a specific trait range. 
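
As a rough illustration of this idea, a target SEM corresponds to a target information of 1/SEM², and items can be added in descending order of information at the trait level of interest until that target is reached. The Python sketch below assumes hypothetical item information values and a hypothetical function name. It is not the procedure used in this study.

```python
def items_needed(item_information_values, target_sem):
    """Count how many of the most informative items reach the target SEM.

    Returns None when even the full item set falls short of the target.
    """
    target_information = 1.0 / target_sem ** 2
    cumulative = 0.0
    ranked = sorted(item_information_values, reverse=True)
    for count, info in enumerate(ranked, start=1):
        cumulative += info
        if cumulative >= target_information:
            return count
    return None


# Hypothetical information values at one trait level for a seven-item subscale.
hypothetical_info = [1.5, 0.25, 1.0, 0.75, 0.5, 1.25, 0.25]
print(items_needed(hypothetical_info, target_sem=0.5))   # 4
print(items_needed(hypothetical_info, target_sem=0.25))  # None
```

In this hypothetical case, four of the seven items suffice for an SEM of 0.5, while an SEM of 0.25 cannot be reached with the available items.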

In conclusion, this study has shown that the use of IRT procedures can 
provide valuable additional psychometric information over classical test theory. 

It is well documented that good CTT-based psychometric properties for a 
measure do not necessarily mean that it will have good IRT-based 
psychometric properties. This has to be demonstrated using IRT procedures. This 
study has also demonstrated how IRT can be used to revise existing measures. It 
is hoped that this study has shown the value of using IRT to evaluate the 
psychometric properties of measures and for test development and revision, and 
that it will encourage other researchers to use IRT approaches for similar 
purposes. 